D1.3 decode-kernel + residual composition (Phase 1 scaffold complete, 104/104 tests) #235
Final Phase 1 scaffold deliverable. This module sits on the
hydration/calibration path, NOT the cascade inference path (per
cognitive-shader-architecture.md line 582: the cascade uses
p64_bridge::CognitiveShader::cascade + 8 predicate planes × bgz17 O(1)
distance, no per-inference codec work).
crates/cognitive-shader-driver/src/decode_kernel.rs (~280 LOC):
DecodeKernel trait (object-safe, Send+Sync+Debug):
  decode(&self, &[u8]) -> Result<Vec<f32>, DecodeError>
  encode(&self, &[f32]) -> Result<Vec<u8>, DecodeError>
  bytes_per_row(&self) -> u32
  dim(&self) -> u32
  signature(&self) -> u64          # JIT cache key
  backend(&self) -> &'static str   # never "scalar" on SoA
StubDecodeKernel { dim, tag } — byte-exact f32 ↔ u8 round-trip via
native-endian reinterpret. No quantization, no compression; exists
so composition plumbing can be tested without a trained palette.
Backend = "stub". Signature hashes "stub_decode" + dim + tag.
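
A minimal sketch of the trait and stub described above, for readers who want to see the shape of the round-trip. It is not the code from decode_kernel.rs: the field types of DecodeError are assumptions, the tag field and signature() are left out, and the method bodies are one plausible way to do a native-endian reinterpret.

```rust
use std::fmt::Debug;

/// Error type with the two variants named in the commit message.
/// The field types are assumptions; the source only lists field names.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    SizeMismatch { expected: usize, actual: usize },
    Stage { stage: &'static str, detail: String },
}

/// Object-safe kernel trait (signature() omitted in this sketch).
pub trait DecodeKernel: Send + Sync + Debug {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError>;
    fn encode(&self, row: &[f32]) -> Result<Vec<u8>, DecodeError>;
    fn bytes_per_row(&self) -> u32;
    fn dim(&self) -> u32;
    fn backend(&self) -> &'static str;
}

/// Stub kernel: 4 bytes per element, no quantization, no compression.
#[derive(Debug)]
pub struct StubDecodeKernel {
    pub dim: u32,
}

impl DecodeKernel for StubDecodeKernel {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError> {
        let expected = self.bytes_per_row() as usize;
        if bytes.len() != expected {
            return Err(DecodeError::SizeMismatch { expected, actual: bytes.len() });
        }
        // Reinterpret each 4-byte chunk as one native-endian f32.
        Ok(bytes
            .chunks_exact(4)
            .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect())
    }

    fn encode(&self, row: &[f32]) -> Result<Vec<u8>, DecodeError> {
        let expected = self.dim as usize;
        if row.len() != expected {
            return Err(DecodeError::SizeMismatch { expected, actual: row.len() });
        }
        // Emit the raw native-endian bytes of every element.
        Ok(row.iter().flat_map(|x| x.to_ne_bytes()).collect())
    }

    fn bytes_per_row(&self) -> u32 {
        self.dim * 4
    }

    fn dim(&self) -> u32 {
        self.dim
    }

    fn backend(&self) -> &'static str {
        "stub"
    }
}
```

Because the stub only copies bit patterns, comparing decoded values with to_bits() is a stricter check than float equality (it also distinguishes -0.0 from 0.0).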
ResidualComposer { base: Box<dyn DecodeKernel>, residual: Box<dyn DecodeKernel> }
Two-stage residual composition:
  encode(v)   = [base.encode(v); residual.encode(v - base.decode(base.encode(v)))]
  decode(enc) = base.decode(enc[..base_b]) + residual.decode(enc[base_b..])
Nests recursively — residual slot can itself be a ResidualComposer
(depth > 1). Rejects mismatched dims at construction.
Backend = "stub" if either stage is stub, else base's backend
(weakest-link reporting for latency-critical stages).
DecodeError { SizeMismatch { expected, actual }, Stage { stage, detail } }
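
The two formulas above are easier to follow with a lossy base stage, since with two stubs the residual is all zeros. The sketch below is illustrative only: the Stage trait is a pared-down stand-in for DecodeKernel (no error handling, no signature or backend), and RoundI8 and RawF32 are hypothetical toy stages that do not exist in the crate.

```rust
/// Hypothetical, simplified stage interface standing in for DecodeKernel.
pub trait Stage {
    fn dim(&self) -> usize;
    fn bytes_per_row(&self) -> usize;
    fn encode(&self, v: &[f32]) -> Vec<u8>;
    fn decode(&self, bytes: &[u8]) -> Vec<f32>;
}

/// Lossy toy stage: rounds each value to the nearest i8 (1 byte per element).
pub struct RoundI8 {
    pub dim: usize,
}

impl Stage for RoundI8 {
    fn dim(&self) -> usize { self.dim }
    fn bytes_per_row(&self) -> usize { self.dim }
    fn encode(&self, v: &[f32]) -> Vec<u8> {
        v.iter()
            .map(|x| x.round().clamp(-128.0, 127.0) as i8 as u8)
            .collect()
    }
    fn decode(&self, bytes: &[u8]) -> Vec<f32> {
        bytes.iter().map(|&b| b as i8 as f32).collect()
    }
}

/// Exact toy stage: raw native-endian f32 bytes (4 bytes per element).
pub struct RawF32 {
    pub dim: usize,
}

impl Stage for RawF32 {
    fn dim(&self) -> usize { self.dim }
    fn bytes_per_row(&self) -> usize { self.dim * 4 }
    fn encode(&self, v: &[f32]) -> Vec<u8> {
        v.iter().flat_map(|x| x.to_ne_bytes()).collect()
    }
    fn decode(&self, bytes: &[u8]) -> Vec<f32> {
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

/// Two-stage composer; the residual slot may itself be a Residual.
pub struct Residual {
    base: Box<dyn Stage>,
    residual: Box<dyn Stage>,
}

impl Residual {
    /// Rejects mismatched dims at construction.
    pub fn new(base: Box<dyn Stage>, residual: Box<dyn Stage>) -> Result<Self, String> {
        if base.dim() != residual.dim() {
            return Err(format!("dim mismatch: {} vs {}", base.dim(), residual.dim()));
        }
        Ok(Residual { base, residual })
    }
}

impl Stage for Residual {
    fn dim(&self) -> usize { self.base.dim() }
    fn bytes_per_row(&self) -> usize {
        self.base.bytes_per_row() + self.residual.bytes_per_row()
    }
    fn encode(&self, v: &[f32]) -> Vec<u8> {
        // [base.encode(v); residual.encode(v - base.decode(base.encode(v)))]
        let mut out = self.base.encode(v);
        let approx = self.base.decode(&out);
        let err: Vec<f32> = v.iter().zip(&approx).map(|(x, a)| x - a).collect();
        out.extend(self.residual.encode(&err));
        out
    }
    fn decode(&self, bytes: &[u8]) -> Vec<f32> {
        // base.decode(enc[..base_b]) + residual.decode(enc[base_b..])
        let (b, r) = bytes.split_at(self.base.bytes_per_row());
        let mut out = self.base.decode(b);
        for (dst, r) in out.iter_mut().zip(self.residual.decode(r)) {
            *dst += r;
        }
        out
    }
}
```

With RoundI8 as base and RawF32 as residual, the base carries the coarse value and the residual carries whatever the rounding lost. Nesting a second Residual in the residual slot gives three stages, and bytes_per_row is the sum over all of them.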
Tests (9 new, all under --features serve):
- stub_round_trip_is_exact
- stub_rejects_wrong_input_size
- residual_compose_round_trip_is_exact_when_both_stubs
(both stubs = byte-exact; residual all zero; output == input)
- residual_compose_mismatched_dims_rejected
- residual_compose_bytes_per_row_sums_stages
- residual_compose_nested_depth_two_round_trip
(ResidualComposer whose residual IS another ResidualComposer —
depth=2 encodes 3 stages, still byte-exact when all stubs)
- signatures_distinguish_composer_from_stages
- signature_depends_on_stage_order
(base+residual vs residual+base produce different signatures)
- composer_backend_reports_stub_when_any_stage_is_stub
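
The last three tests pin down two properties: the composer's signature depends on stage order, and its backend reports "stub" if any stage is a stub. A small sketch of how both could be derived is given below. The hasher choice, the "residual_composer" domain string, and the function names are assumptions for illustration; the source only says the stub signature hashes "stub_decode" + dim + tag.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stub signature: hashes "stub_decode" + dim + tag, as the commit message states.
/// DefaultHasher is an assumption; its algorithm is not guaranteed stable across
/// Rust releases, so a persisted cache key would need a fixed hasher instead.
pub fn stub_signature(dim: u32, tag: &str) -> u64 {
    let mut h = DefaultHasher::new();
    "stub_decode".hash(&mut h);
    dim.hash(&mut h);
    tag.hash(&mut h);
    h.finish()
}

/// Composer signature: feeds the stage signatures into the hasher in order,
/// so swapping base and residual yields a different key.
pub fn composer_signature(base_sig: u64, residual_sig: u64) -> u64 {
    let mut h = DefaultHasher::new();
    "residual_composer".hash(&mut h); // hypothetical domain separator
    base_sig.hash(&mut h);
    residual_sig.hash(&mut h);
    h.finish()
}

/// Weakest-link backend: "stub" if either stage is a stub, else the base's backend.
pub fn composer_backend(base: &'static str, residual: &'static str) -> &'static str {
    if base == "stub" || residual == "stub" {
        "stub"
    } else {
        base
    }
}
```

Hashing a domain string first keeps a composer's key from colliding with the key of either stage on its own, which is what signatures_distinguish_composer_from_stages checks.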
Scope clarification (per orientation loaded this session from
cognitive-shader-architecture.md + ripple-dto-contracts.md):
- D1.x codec kernels = hydration/calibration path
- Cascade inference path = p64_bridge::CognitiveShader at L2
- Real kernels replace StubDecodeKernel once D1.1b lands the
ndarray::hpc::jitson_cranelift::JitEngine adapter
Board hygiene (CLAUDE.md Mandatory rule):
STATUS_BOARD.md D1.3 Queued → In PR
Rules honored:
Rule A — in-place &mut operations via Vec; no manual index math
Rule B — no std::arch / no hpc::simd_avxNNN reach
Rule C — n/a at the composition layer (real kernel backend selection
defers to D1.1b per-stage)
Rule D — codec params come from CodecParams via Wire DTOs (D0.1-D0.3)
Rule E — trait methods expose signature + backend + bytes_per_row
Rule F — no serialization between stages; Vec<f32>/Vec<u8> owned
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f58967902
    for (dst, &r) in out.iter_mut().zip(&residual_v) {
        *dst += r;
Reject stage decode length mismatches before summing
ResidualComposer::decode adds stage outputs with zip, which truncates to the shorter vector. If either stage returns fewer than dim() elements, this returns Ok(...) with silently corrupted output instead of surfacing an error, even though the trait contract says decode should produce full-dimension vectors. Add explicit length checks for both decoded stage vectors before the accumulation loop so malformed/buggy stage implementations fail fast.
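
One way the suggested check could look, as a standalone sketch rather than a patch to ResidualComposer::decode. The function name sum_stage_outputs, the use of the Stage variant, and its field types are assumptions; only the zip loop is taken from the reviewed code.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    // Field types are assumed; the PR only names the fields.
    Stage { stage: &'static str, detail: String },
}

/// Sums two decoded stage outputs, failing fast if either is not `dim` long.
pub fn sum_stage_outputs(
    dim: usize,
    base_v: Vec<f32>,
    residual_v: Vec<f32>,
) -> Result<Vec<f32>, DecodeError> {
    // Validate both lengths up front; zip alone would silently truncate.
    let check = |stage: &'static str, actual: usize| {
        if actual == dim {
            Ok(())
        } else {
            Err(DecodeError::Stage {
                stage,
                detail: format!("decoded {} elements, expected {}", actual, dim),
            })
        }
    };
    check("base", base_v.len())?;
    check("residual", residual_v.len())?;

    let mut out = base_v;
    for (dst, &r) in out.iter_mut().zip(&residual_v) {
        *dst += r;
    }
    Ok(out)
}
```

With both lengths verified first, the zip in the accumulation loop can no longer drop elements, so a short stage output surfaces as an error instead of a truncated Ok value.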
Summary
Final Phase 1 scaffold deliverable: D1.3 decode-kernel trait + residual composition. Scope corrected after loading the canonical architecture docs (cognitive-shader-architecture.md + ripple-dto-contracts.md + encoding-ecosystem.md): this module sits on the hydration / calibration path, NOT the cascade inference path (which uses p64_bridge::CognitiveShader::cascade with 8 predicate planes × bgz17 O(1) distance, no per-inference codec work).

104/104 cognitive-shader-driver --features serve tests pass (+9 new).

What lands
crates/cognitive-shader-driver/src/decode_kernel.rs (~280 LOC).

Composition semantics (matches plan D1.3 spec): stages compose recursively. The residual slot is itself Box<dyn DecodeKernel>, so a depth-2 composer has another ResidualComposer in its residual slot. Tests verify byte-exact round-trip through nested-depth-2 all-stub composition (3 stages, 4 dim × 4 bytes × 3 = 48 bytes per row).

Tests (9 new)
- stub_round_trip_is_exact
- stub_rejects_wrong_input_size
- residual_compose_round_trip_is_exact_when_both_stubs
- residual_compose_mismatched_dims_rejected
- residual_compose_bytes_per_row_sums_stages
- residual_compose_nested_depth_two_round_trip
- signatures_distinguish_composer_from_stages
- signature_depends_on_stage_order (base+residual ≠ residual+base; order is part of identity)
- composer_backend_reports_stub_when_any_stage_is_stub (weakest-link reporting)

Scope correction (per loaded orientation)
Before this PR, my framing of "codec kernels" drifted toward treating them as inference-path infrastructure. Reading cognitive-shader-architecture.md lines 582+ made the distinction explicit:
- p64_bridge::CognitiveShader::cascade(query, radius, layer_mask): 8 predicate planes × bgz17 O(1) palette distance, no Hamming, no POPCNT, table lookup only

StubDecodeKernel is the test fixture; real decoders (once D1.1b lands the ndarray jitson_cranelift::JitEngine adapter) replace it. The composition pattern remains stable across that transition.

Phase 1 state after merge
- CodecKernelCache<H> scaffold
- ndarray::hpc::jitson_cranelift::JitEngine

After merge, Phase 1 scaffold is complete. D1.1b (real Cranelift wiring) is the only remaining Phase 1 piece, and it drops Box<dyn DecodeKernel> kernels that wrap ndarray's JitEngine into the StubDecodeKernel slot in ResidualComposer, with no composition-layer changes required.

Board hygiene (same commit)
STATUS_BOARD.md: D1.3 Queued → In PR.

Test Plan
- cargo test --manifest-path crates/cognitive-shader-driver/Cargo.toml --features serve: 104/104 pass (+9 new)
- cargo test -p lance-graph-contract --lib: 147/147 pass (unchanged)
- cargo test --manifest-path crates/jc/Cargo.toml: 6/6 pass (JC substrate proof unchanged)

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh